Automated production of true-cased punctuated subtitles for weather and news broadcasts

Authors

  • Joris Driesen
  • Alexandra Birch
  • Simon Grimsey
  • Saeid Safarfashandi
  • Juliet Gauthier
  • Matt Simpson
  • Steve Renals
Abstract

Subtitling multimedia content is a costly process. Any system that automates at least part of this process may therefore yield significant economic benefits for content providers. In this paper, we present an integrated system capable of automatically subtitling weather forecasts and news broadcasts. The system strings together a number of modules, each performing a single processing step in the pipeline. An ASR (Automatic Speech Recognition) module first converts raw audio into an uninterrupted stream of written words. A decision tree classifier then marks sentence boundaries in the resulting word sequence. Finally, an SMT (Statistical Machine Translation) module ‘translates’ the resulting sentences into punctuated, true-cased text. The system has been developed in close cooperation with Red Bee Media and will be deployed in their commercial production pipeline.
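
The three-stage pipeline described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the ASR output is a hard-coded, invented word stream, a single pause-length threshold stands in for the decision tree classifier, and a small lookup table stands in for the SMT module. All names (ASR_OUTPUT, mark_boundaries, restore_case_and_punctuation, subtitle, PROPER_NOUNS) and the use of pause length as a feature are assumptions made purely for illustration.

```python
from typing import List, Tuple

# Hypothetical stand-in for the ASR module's output: lowercase words, each
# paired with the pause (in seconds) that follows it. The values are invented.
ASR_OUTPUT: List[Tuple[str, float]] = [
    ("good", 0.05), ("evening", 0.60),
    ("rain", 0.05), ("will", 0.04), ("spread", 0.05),
    ("across", 0.03), ("scotland", 0.70),
]

# Hypothetical lookup table standing in for what an SMT model would learn.
PROPER_NOUNS = {"scotland": "Scotland"}


def mark_boundaries(tokens: List[Tuple[str, float]],
                    pause_threshold: float = 0.5) -> List[List[str]]:
    """Split the word stream into sentences.

    A single threshold on pause length is an assumed stand-in for the
    decision tree classifier mentioned in the abstract.
    """
    sentences: List[List[str]] = []
    current: List[str] = []
    for word, pause in tokens:
        current.append(word)
        if pause >= pause_threshold:
            sentences.append(current)
            current = []
    if current:  # flush any trailing words that had no final long pause
        sentences.append(current)
    return sentences


def restore_case_and_punctuation(sentence: List[str]) -> str:
    """Toy replacement for the SMT step: true-case words, add a full stop."""
    words = [PROPER_NOUNS.get(w, w) for w in sentence]
    words[0] = words[0][0].upper() + words[0][1:]
    return " ".join(words) + "."


def subtitle(tokens: List[Tuple[str, float]]) -> List[str]:
    """Chain the stages: boundary marking, then case and punctuation."""
    return [restore_case_and_punctuation(s) for s in mark_boundaries(tokens)]


if __name__ == "__main__":
    for line in subtitle(ASR_OUTPUT):
        print(line)
```

Keeping each stage as a separate function mirrors the modular design the abstract describes, so any one stand-in could be swapped for a trained model without touching the others.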

Similar resources

The CMU-UKA syntax augmented machine translation system for IWSLT-06

We present the CMU-UKA Syntax Augmented Machine Translation System that was used in the IWSLT-06 evaluation campaign. We participated in the C-Star data track using only the Full BTEC corpus, for Chinese-English translation, focusing on transcript translation. We applied techniques that produce true-cased, punctuated translations from non-punctuated Chinese transcripts, generating translations ...

Real-time recognition of broadcast news

Although the performance of state-of-the-art automatic speech recognition systems on the challenging task of broadcast news transcription has improved considerably in recent years, many of the systems operate in 130-300 times real-time [1]. Many applications of automatic transcription of broadcast news, e.g. closed-caption subtitles for television broadcasts, require real-time operation. This pap...

Employing signed TV broadcasts for automated learning of British Sign Language

We present several contributions towards automatic recognition of BSL signs from continuous signing video sequences: (i) automatic detection and tracking of the hands using a generative model of the image; (ii) automatic learning of signs from TV broadcasts of single signers, using only the supervisory information available from subtitles; (iii) discriminative signer-independent sign recognitio...

Reduction of Dutch Sentences for Automatic Subtitling

We compare machine learning approaches for sentence length reduction for automatic generation of subtitles for deaf and hearing-impaired people with a method which relies on hand-crafted deletion rules. We describe building the necessary resources for this task: a parallel corpus of examples of news broadcasts of the Flemish VRT broadcasting corporation, and a Dutch shallow parser based on the ...

A Web Classifier for Semantic Classification Between News and Sports Broadcasts

Lately, a lot of work has been done in the area of content-based audio classification. In this paper we experiment on audio classification between sports and news broadcasts using the Average Magnitude Difference Function as the feature extractor and an LVQ1 neural network as classifier. The method proves robust and the results are reliable and could be further utilized in an automated web classi...

Publication date: 2014